8 research outputs found

    Runtime address space computation for SDSM systems

    This paper explores the benefits and limitations of using an inspector/executor approach for Software Distributed Shared Memory (SDSM) systems. The role of the inspector is to obtain a description of the address space accessed during the execution of parallel loops. The information collected by the inspector enables the runtime to optimize the movement of shared data during the executor phase. This paper addresses the main issues considered when embedding an inspector/executor model in an SDSM system: the amount of data collected by the inspector, the accuracy of this data when the loop has data and/or control dependences, and the computational overhead introduced. The paper also includes a description of the SDSM system in which the inspector/executor model has been embedded. The proposal is evaluated with four applications from the NAS benchmark suite. The evaluation shows that the accuracy of the inspection and the small overheads introduced by the approach allow its use in an SDSM system.
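
    As a rough illustration of the inspector/executor pattern the abstract describes, the sketch below runs a cheap inspection pass to record the address range a parallel loop will touch before the real loop executes. The sdsm_prefetch call and the range bookkeeping are illustrative placeholders under assumed names, not the paper's actual runtime interface.

        /* Hypothetical sketch of the inspector/executor pattern described above.
         * The sdsm_* call and range handling are illustrative placeholders. */
        #include <stddef.h>

        typedef struct { void *lo; void *hi; } range_t;

        /* Inspector: a cheap pass over the loop bounds that records which
         * region of the shared array the loop will touch. */
        static void inspector(double *a, size_t n, range_t *accessed) {
            accessed->lo = &a[0];
            accessed->hi = &a[n];      /* upper bound of the touched region */
        }

        /* Executor: the real parallel loop, run after the runtime has moved
         * the shared pages described by the inspector. */
        static void executor(double *a, size_t n) {
            #pragma omp parallel for
            for (long i = 0; i < (long)n; i++)
                a[i] = 2.0 * a[i];
        }

        void run(double *a, size_t n) {
            range_t r;
            inspector(a, n, &r);
            /* sdsm_prefetch(r.lo, r.hi);  -- placeholder for the shared-data
                                              movement done before execution */
            executor(a, n);
        }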

    Experiences parallelizing a web server with OpenMP

    Multi-threaded web servers are typically parallelized by hand using the pthreads library. OpenMP has rarely been used to parallelize this kind of application, although we foresee that it can be a great tool for network server developers. In this paper we compare how easy it is to parallelize the Boa web server using OpenMP, compared to a pthreads parallelization, and the performance achieved. We present the results of parallelizations based on OpenMP 2.0, the dynamic sections model, and pthreads.
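
    For readers unfamiliar with the idea, the following sketch shows one simple way an accept loop can be parallelized with OpenMP, with each thread accepting and serving connections on the shared listening socket. handle_connection is a hypothetical handler for illustration, not code from Boa or from the paper.

        /* Minimal sketch of parallelizing a server accept loop with OpenMP.
         * handle_connection() is a placeholder request handler. */
        #include <sys/socket.h>

        void handle_connection(int fd);    /* hypothetical request handler */

        void serve(int listen_fd) {
            /* Each OpenMP thread runs its own accept loop, roughly the
             * equivalent of a pool of pthreads sharing the listening socket. */
            #pragma omp parallel
            {
                for (;;) {
                    int conn = accept(listen_fd, NULL, NULL);
                    if (conn < 0)
                        continue;
                    handle_connection(conn);
                }
            }
        }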

    Runtime-guided management of stacked DRAM memories in task parallel programs

    Stacked DRAM memories have become a reality in High-Performance Computing (HPC) architectures. These memories provide much higher bandwidth while consuming less power than traditional off-chip memories, but their limited capacity is insufficient for modern HPC systems. For this reason, both stacked DRAM and off-chip memories are expected to co-exist in HPC architectures, giving rise to different approaches for architecting the stacked DRAM in the system. This paper proposes a runtime approach to transparently manage stacked DRAM memories in task-based programming models. In this approach the runtime system is in charge of copying the data accessed by the tasks to the stacked DRAM, without complex hardware support or modifications to the application code. To mitigate the cost of copying data between the stacked DRAM and the off-chip memory, the proposal includes an optimization to parallelize the copies across idle or additional helper threads. In addition, the runtime system is aware of the reuse pattern of the data accessed by the tasks and can exploit this information to avoid copies to the stacked DRAM that do not pay off. Results on the Intel Knights Landing processor show that the proposed techniques achieve an average speedup of 14% over the state-of-the-art library for managing the stacked DRAM and 29% over a stacked DRAM architected as a hardware cache. This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the European Union’s Horizon 2020 research and innovation programme (grant agreement 779877). M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104.
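
    The copy-parallelization idea can be pictured with the hedged sketch below: a block being staged into stacked DRAM is split into chunks that helper threads copy concurrently. hbm_alloc is an assumed name standing in for whatever high-bandwidth-memory allocator is used; it is not the runtime interface proposed in the paper.

        /* Illustrative sketch of parallelizing a copy into stacked DRAM.
         * hbm_alloc() is a hypothetical HBM allocator, not the paper's API. */
        #include <string.h>
        #include <stddef.h>

        void *hbm_alloc(size_t bytes);     /* hypothetical HBM allocator */

        void *stage_to_hbm(const void *src, size_t bytes, int nthreads) {
            char *dst = hbm_alloc(bytes);
            size_t chunk = (bytes + nthreads - 1) / nthreads;

            /* Spread the copy across helper threads instead of paying for
             * one large serial memcpy on the task's critical path. */
            #pragma omp parallel for num_threads(nthreads)
            for (int t = 0; t < nthreads; t++) {
                size_t off = (size_t)t * chunk;
                if (off < bytes)
                    memcpy(dst + off, (const char *)src + off,
                           off + chunk > bytes ? bytes - off : chunk);
            }
            return dst;
        }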

    Runtime Address Space Computation for SDSM Systems

    This paper explores the benefits and limitations of using an inspector/executor approach for Software Distributed Shared Memory (SDSM) systems. The role of the inspector is to obtain a description of the address space accessed during the execution of parallel loops. The information collected by the inspector enables the runtime to optimize the movement of shared data during the executor phase. This paper addresses the main issues considered when embedding an inspector/executor model in an SDSM system: the amount of data collected by the inspector, the accuracy of this data when the loop has data and/or control dependences, and the computational overhead introduced. The paper also includes a description of the SDSM system in which the inspector/executor model has been embedded. The proposal is evaluated with four applications from the NAS benchmark suite. The evaluation shows that the accuracy of the inspection and the small overheads introduced by the approach allow its use in an SDSM system.

    Techniques supporting threadprivate in OpenMP

    This paper presents and evaluates the alternatives available to support threadprivate data in OpenMP. We show how current compilation systems rely on custom techniques for implementing thread-local data, even though the ELF binary specification already supports data sections that are threadprivate by default; ELF calls such areas Thread-Local Storage (TLS). Our experiments demonstrate that implementing threadprivate on top of the TLS support is straightforward and more efficient. This proposal is in line with the planned implementation of OpenMP in the GNU Compiler Collection. In addition, our experience with the use of threadprivate in OpenMP applications shows that it is usually better to avoid it: threadprivate variables reside in common blocks, which prevents the compiler from fully optimizing the code. It is therefore better to keep threadprivate as a temporary technique, only to ease porting MPI codes to OpenMP.
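
    The two mechanisms discussed in the abstract can be contrasted with a minimal sketch: an OpenMP threadprivate variable and a variable placed directly in ELF TLS via the __thread keyword, each giving every thread its own copy. This is a generic illustration, not code from the paper.

        /* Minimal sketch: per-thread data via OpenMP threadprivate vs. ELF TLS. */
        #include <omp.h>
        #include <stdio.h>

        int counter_omp;                    /* per-thread copy via OpenMP */
        #pragma omp threadprivate(counter_omp)

        __thread int counter_tls;           /* per-thread copy via ELF TLS */

        int main(void) {
            #pragma omp parallel
            {
                counter_omp = omp_get_thread_num();
                counter_tls = omp_get_thread_num();
                printf("thread %d: omp=%d tls=%d\n",
                       omp_get_thread_num(), counter_omp, counter_tls);
            }
            return 0;
        }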

    Experiences parallelizing a web server with OpenMP

    Multi-threaded web servers are typically parallelized by hand using the pthreads library. OpenMP has rarely been used to parallelize this kind of application, although we foresee that it can be a great tool for network server developers. In this paper we compare how easy it is to parallelize the Boa web server using OpenMP, compared to a pthreads parallelization, and the performance achieved. We present the results of parallelizations based on OpenMP 2.0, the dynamic sections model, and pthreads.

    Experiences parallelizing a web server with OpenMP

    Multi-threaded web servers are typically parallelized by hand using the pthreads library. OpenMP has rarely been used to parallelize this kind of application, although we foresee that it can be a great tool for network server developers. In this paper we compare how easy it is to parallelize the Boa web server using OpenMP, compared to a pthreads parallelization, and the performance achieved. We present the results of parallelizations based on OpenMP 2.0, the dynamic sections model, and pthreads.

    Experiences parallelizing a web server with OpenMP

    Multi-threaded web servers are typically parallelized by hand using the pthreads library. OpenMP has rarely been used to parallelize this kind of application, although we foresee that it can be a great tool for network server developers. In this paper we compare how easy it is to parallelize the Boa web server using OpenMP, compared to a pthreads parallelization, and the performance achieved. We present the results of parallelizations based on OpenMP 2.0, the dynamic sections model, and pthreads.